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Abstract. We show that Lasso and Bayesian Lasso are very close when the sparsity is 
large and the noise is small. Then we propose to solve Bayesian Lasso using multivalued 
stochastic differential equation. We obtain three discretizations algorithms, and 
propose a method for calculating the cost of Monte-Carlo (MC), multilevel Monte 
Carlo (MLMC) and MCMC algorithms. 
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1. Introduction 

Let y = Ax + aw be the classical linear regression problem see e.g. [31] and the 
references herein, (see also HHEIE] for some new applications). Here p and n is a 
couple of positive integers, y G M” are the observations, tc G is the unknown signal 
to recover, w G M"" is the standard noise, a is the size of the noise and A is a known 
matrix which maps the signal domain into the observation domain The matrix 
A is in general ill-conditioned (e.g. in the case n < p) which makes difficult to use the 
least squares estimate. Penalization is a popular way to compute an approximation of x 
from the observations y. The general framework proposes to recover the vector x using 
the posterior probability distribution function proportional to 

f , . IIAa; — ?/|p\ 

exp {-P(x) - j . 

Here || • || denotes the Euclidean norm. This requires to dehne a penalization P to 
enforce some prior information on the signal x. The term reflects Gaussian 

prior on the noise w. The parameter cx^ > 0 reflects the noise level. 

The C penalization is the sum of the absolute values P{x) = a;||a;||i of the 
components of ax. The parameter a > 0 reflects the sparsity level of the variable 
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X. The Lasso := argmin{a||a;||i + , x G M^} was first introduced in [21]. It is 

also called Basis Pursuit De-Noising method [8]. It was introduced to induce sparsity 
in the variable x. A large number of theoretical results has been provided for the 
penalization see e.g. P[I11E3] and the references herein. 

We will suppose that a = 2(3 and <7^ = ^. It follows that the posterior PDF is 
equal to 

-^exp{-2(]F{x)), (1) 

where 

fM^||^||.+ (2) 

and Zjs is the partition function, i.e. 

Zfs= exp ( — 2/3F{x))dx. 

Jm.p 

Bayes estimator of x is equal to 

r (jn[» 

:= / xexp{-2^F{x)) — . (3) 

Jrp ^0 

Lasso is the maximum a posteriori estimator 

Lasso = argmin{F(a;) : x G W}. (4) 

In the sequel Xp will denote a random vector having the probability distribution Q. 
Hence Bayes estimator ([^ is the mathematical expectation 

ElXjJ. (6) 

In the hrst part of this work we show how Bayes estimator converges to Lasso as 
(3 —)■ +CXD. In the second part we consider for hxed (3 the random vector Xp as the limit 
of a multivalued stochastic process {x(T)) (Langevin diffusion with non-smooth drift) 
as T —)• -|-oo. We propose to approximate Bayes estimator E[W^] by the mathematical 
expectation E[a;(T)] for large T. We obtain three discretizations algorithms. Two 

among them are known as unadjusted Langevin algorithm (ULA) ([22|) and STMALA 
f|15ji. We calculate the latter mathematical expectation E[a;(T)] using Monte Carlo 
(MC), Multilevel Monte Carlo (MLMC) and MCMC methods. We propose a method 
for calculating the cost of MC, MLMC and MCMC. 

2. Lasso estimator properties 

First, we need some notations. For each x G the sub-differential sgn{x) = cl||a?||i is 
the set of the column vector ^ G such that the component = sgn{xi) = 1 if > 0, 
= sgn{xi) = —1 if < 0 and G [—1,1] if x, = 0. 

We will denote, for each subset J C {!,...,p} and for each vector v G 
v{J) = {v{i) i E J) G Here |J| denotes the cardinality of J. The notation 
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V < w means v{i) < w{i) for alH = 1,2,... ,p. The scalar product is denoted by {■, ■), 
and (cj : i = 1,2,... ,p) denotes the canonical basis of 

Now we recall a well known properties of Lasso estimator see e.g. | 32 |. 
lemma: The vector x{ii) is a minimizer of the map x —)■ F{x) = ||a:||i + 
if and only if the vector 

i:= A*{y - Ax{y)) e sgn{x{y)). (6) 

The vectors Ax(y) and the /^-norm ||a:(y)||i are constant on the set of Lasso estima¬ 
tors. Moreover, the set of Lasso is convex and compact. Here A* denotes the transpose 
of the matrix A. 


We introduce the sets 

I ={ze{l,...,p}: |e*|<l}, (7) 

dl = {$ e {1,... ,p} : iCil = 1}. (8) 

Observe that the support {i G {1,..., p} : Xi(y) ^ 0} of any Lasso x{y) is contained in 
9J, and / is contained in the set {i G {1,... ,p} : Xi{y) = 0} of the null components 
of x{y). For each subset J of {!,... ,p}, Aj denotes the submatrix of A having its 
columns indexed by J. 

From ’’equation (§” it is easy to show that the injectivity of Aqi implies the 
uniqueness of Lasso. In fact, under this hypothesis the system 

idi = ^diV ~ ^di-^diXdi{y) 

has a unique solution. As the support of any Lasso x{y) is contained in dl, then Lasso 
is unique. 

In the sequel for each x G 

7i{x) = argmin{||a: — x{y)\\ : x{y) G Lasso}. 
prop: The random positive number \\Xp — 7r(A_a)|| converges to 0 in probability as 

/S —>• -l-CXD. 


proof: The proof is similar to Theorem 4.1. in [T]. It works as following. 

Let 5 > 0, and p > 0 such that 

inf{F(a:) : \\x — 7i{x)\\ > 6} > M{p) = sup {F{x) : ||a: — 7r(a:)|| < p}, 

where F is given by ’’equation ([^”. We have 

I\\a^-.ia^)\\>5^M-PF{x))dx 


F(||A^-7r(A^)|| >5) = 


< 


J exp{—(3F{x))dx 

I\\a.-A-)\\>5 ^(V))dx 

I\\a,-H-)\\<r, exp ( - - M{p))dx ■ 


From the estimate 


11®—7r(a;)||>5 


exp ( — (3{F{x) — M{p)))dx < 


II®—7r(®)|| >5 


exp ( — {F{x) — M{y)))dx < -f-cx) 
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and the bounded convergence theorem, the numerator exp(—/3(F(a;) — 

M{ri))dx —)■ 0 as /3 —>■ +cxd. The denominator 


exp { — (3{F{x) — M{ri))dx > / dx. 

11®—7r(ic)||<?7 \\x—F{x)\<r) 


It follows that 


P{\\Xp- 7 r{X 0 )\\ >S)< ^^- ^0 




x—7r(x)\\<r) 


dx 


as /d —)■ + 00 . 


Now we are interested in the speed of convergence of X^ — 7r(X^) —)■ 0 as /3 —>• +oo. 
The first step of this convergence is based on the following. 

Prop: Let x{y) be any Lasso estimator and m = F{x{y)) be the minimum of the 
objective function F{x) ’’equation Q”. The function F{x) — m is equal to 

II A(a: — a:(^))|P 

^ |a:i|(l - s^n(a;i)^i) + ^- - -• (9) 

i=l 

And then 

p 

^ |xi|(l - sgn{xi)^i) = ^ IxiKl - sgn{xi)^i) + 
i=l i£l 

2 Y1 (“) 

i£dl:sgn{xi)^i=—l 

Here ^ is defined by ’’equation and dl are defined by ’’equation 0”, and 


’equation (10) 


Proof: From the equality || Aa: — y|p = || A(a: — x{y))\\^ + 2{A{x — x{y)), Ax{y) — 
y) + IIAa:(^, t) - y\\^, we have 

F{x) = 

\\A{x — x{y))P / \N * / N \ l|Aa:(y) —y|P 

lla^lli + ^- 2 -^ ^^(y) -y) + ^- 2 - 

= lla^lli + ^- 2 -^ + {^- ^(y), A {Ax{y) - y)) + ^- - -, 

From the equality ^ = A*{y — Ax{y)) , we have 

{x - x{y), A*{Ax{y) - y)) = - {x - x{y),0 

= - + l|aj(2/)lli- (11) 

Now formulas ’’equation 0” and ’’equation (10)” are an easy consequence of the formula 
’’equation (lull”. 

Now, we are interested in the asymptotic independence of the components (A^(f) : 
z G /), (A^(z) : i G dl) as /d —)■ +cxd. We are going to solve this problem when A*qjAqj 
is invertible. In this case Lasso is a singleton {a:(y)}. 

The support of x{y) is S' = {z : Xi{y) p 0}. The complementary of S is 
Jo = {z : Xiiy) = 0}. The boundary of OIq = {z : Xi{y) = 0, |^j| = 1}. The family 
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(5, Jo \ dlo, dio) is a partition of {1, 2,... ,p}. In the seqnel is considered as the set 
of the seqnences {xi : i E (Jq \ OIq) U OIq U S) indexed by (Jq \ OIq) U OIq U S. The 
notation will denotes the set of the seqnences (xj : j E J) with values in M. 

Observe that I = Iq \ dlo ’’equation ([^”, and dl = S U dlo ’’equation (10)”. 
For i E S and for Xi near Xi{y), we have sgn{xi) = In this case the equality 
’’equation ([Io|)” becomes 

\xi\{l - sgn{xi)^i)+ 2 \xi\. (12) 

ieIo\dIo iedIo.sgn(xi)^i=—l 

Now we decompose as following. Each partition dl^ U dlQ of dlo dehnes two 

sets 


A- := A(aJo-) 

= {a; G : Xi^i =— 1 ,\/i E dl^}, 

A+ := A(a/o+) 

= {a; G : Xi^i = i E dl^}. 

We have 

RP = IJ A” n A+. 

dloUdl+=dlo 

It follows that for each suitable function / 

E[/(^/3)] = E[/(^/3) \x^EA-n A+]P(X^ G A- n A+). 

dIoUdI+=dlo 

The main result of this section is the following. 

prop: We have for each partition K~ U = dlo with K~ ^ 0 that 
P(X^ G X{K-) n X{K+)) ^0 as ^ +cx). 


proof: We suppose without loosing any generality for all i G dlo that 
’’equation ([^”, we have for large (3 that 


P(X^ G A(JJ-) n A(JJ+)) ^ 


A(/3,^,JJ+,JF-) 

'n,dl+yjdlQ=dlo dl^.dlQ )’ 


where 6 is small and 


1. From 


A(/?,5, 9/(+,aJo ) 
G{x) 


' [a;eA“nA+,||a:—a:(y)|| oo <^] 


exp ( — (3G{x))dx, 


\xi\{l - ^isgn{xi)) + 2 \xi\ + 

i€:Io\dIo iGdiQ 

||A(a; - x{y))f 
2 


We recall that by hypothesis K 7 ^ 0, but in the denominator the sum X]a/+u 9 /y= 9 /o 
contains the case dl^ = 0 . 
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We use the new variables 

Ui = /3xi, i e Io\ dlQ, 

Vi = - Xi{y)), ieSUdlQ, 

and then we obtain 


where 


with 


C{fi,s,ai*,dii)= j 

J[-5p,of‘o X[-5/3,5/3]-fo\S-rox [-5^,6^]^ 

exp (y—G{u, V, 13, OIq, dl^)) dudv, 

G{u,v,l3,dl^ ,01^) = ^ \ui\{l - iisgniui)) + 2 ^ \ui\ + 


ieioWo 


i&dir. 


ll"^SU9/+'^5ua/+ + ^^‘^-^Io\dI+'^Io\dI+\\'^ 


Observe that G{u, v, (3, dl^, OIq ) converges to 

G{u,v,dI^,dlQ) = ^ \ui\{l - iisgn{ui)) + 


i&h\dIo 


o I I ll"^su9/+'^5ua/+lP 
2 2 ^ \u^\ +-^ 

iGOIq 

as /3 —)■ +CX), and then G{(3, 6, OIq, OIq) converges to the following positive constant 

c(di*,dip) 

J ( —OO,0]^'^0 X ( —00, + Oo)(-^o\^-^o)U‘S' X (0, + oo)^'^0 

exp [—G{u, V, SIq , OIq)) dudv 

as /3 —)■ + 00 . By observing that |9/(^| = \dIo\ is the minimizer of 

l„.| ^ ,„| _ hid , M. 

it follows that for K~ ^ 0, 

A{f3,6,K+,K-) 


'^dl+uaiQ=dIo 1^1 1 ^^0 ) 

converges to 0 as /3 —)■ +C)0. 

As a consequence we derive that as (3 —)■ +cx), 

P(X^(*)6 = l,VzeaJo)^l, 

and then we get the following. 


Theo: [TO] If A*qjAqj is invertible, then the components 

{l3Xii{i),i e Io\ dio), (v^(X^(i) - Xi{y)) : i e S U die 
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are asymptotically independent as /3 —>■ +cxd. Their asymptotic PDF are proportional 
respectively to 


JJ exp - IxiKl - 
i&Io\dIo 


exp 


||Aguajo(a; - x{y))sudio\\‘^ 
2 


3. Bayesian Lasso and multivalned diffnsion 


First we solve rigoronsly the following stochastic differential eqnation 

dx = —[d||a;||i + A*(Aa; — y)]dt + dw, (13) 

where w is the standard Brownian motion. Second we show that the solntion of 
’’eqnation (13)” is ergodic with the stationary probability density ’’eqnation (Q” with 
/? = !. 


3.1. Yosida approximation 

Let (p : IRP —)■ (—oo,+oo] be a proper l.s.c. convex fnnction, and V(W) be the set of 
snbsets of The snb-differential dtp is the map from —)■ V{W) dehned by 

dp>{x) = {n G : p>{x + h) > p>{x) + (h,n), Vh, G 

The domain 

Dom{d(p) = {x : d(p{x) ^ 0}. 

A seqnence of single valned approximations for the snbdifferential d(p{x) is based 
on Yosida approximation. For each e > 0 and z G W, the eqnation 

X = z + edip{z) 

has a nniqne solntion denoted by 
z = (I + ed(p)~^{x) 

:= proXe^{x). 

The map prox^ip : —>■ Dom{d<p) is called proximal fnnction. The Yosida 

approximation of the snb-differential d(p is the application 

The following are well known see e.g. 1211 . 
prop: We have 

(i) prox^p, is a contraction from to Dom{dip). 

(ii) /?£ is monotone on the whole i.e. 

{/3e{xi) - /3 ^{x 2), Xi - X 2 ) > 0 , 

for all Xi,X 2 G and is Lipschitz continnons with the constant A 






(iii) For every x G MP, /3e{x) G dip{proXsip{x)). 
prop: For each e > 0, the map 

a: G —)■ ^ei^) = mm.{ip{z) + 

is called the Yosida approximation of the function p. We have 

(i) is convexe with the domain 

(ii) is of class with = (Ps- 

(iii) The inhmum dehning (pe{.x) is attained at proXe^{x), and 

^s{x) = ^\\P,{x)f + p,{proXe:^{x)). 

(iv) Letting e | 0, we have (fs t for all x G 
In the case (p{x) = ||a:||i, we have 

proXa^{x) = {x + q;) 1 [ 3 ,<_„] + {x - a) 1 [„,>„], 


and 


r 

p^{x) = minj ^ \zi\ + 

i=l 


2e 


min {\zi\ + 


\Zi -xA 


2e 


E 

i=l 

2=1 


The gradient 


Vip,{x) = I3,{x) 

X 

S^?r(a:)l[|a;|>e] T ~l[|a:|<£]- 

Finally 

EX 

pTOX(y^^p^{x^ {x T o) 1 j3.<_Q,_j] T ^ ^ ^l[|3;|<a+£] T C>i')'^[x>a+e]- 


3.2. Multivalued stochastic differential equation 

Now, we come back to Multivalued stochastic differential equation. Let w be the 
standard Brownian motion on ISP and b : MP ^ MP he a smooth map. A solution 
of the M^'.jYiultivalued stochastic differential equation (abbreviated MSDE) 

dxt = —d(p{xt)dt — h{xt)dt + dwt (14) 

is a couple of continuous adapted stochastic processes t G [0,+cx)) —)■ {x(t),l(t)) with 
values in MP x MP, and such that l{0) = 0, t —)■ l{t) has bounded variation on each 
compact interval and 

dxt = — 


dlt — b{xt)dt + dwt, 
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and € d(p{x{t)y\ i.e. the measure {xt — at,dlt — jdtdt) is non-negative for all 

continuous trajectory t —)■ {at, Pt) such that fdt G dip{at). Observe that if dlt = I'tdt, 
then G d^{xt). 

It’s known that if 


||h(a;i) - b{x2)\\ < C\\xi - X2\\, Xi,X2, 

||h(a;)|| < 0(1 + ||a;||), Va;, 

then there exits a unique solution {x,l). See e.g. [6],[7], [5], [20], |1], 


It follows 


that ’’equation (13)” has a unique solution {x,l). In general the measure dlt is not 
absolutely continuous with respect to the Lebesgue measure dt. However we are going 
to show that dlt is absolutely continuous in the case ’’equation (13)”. We recall two 
methods for constructing the solution x of ’’equation (13)”. 

1) By choosing ^p{x) = ||a;||i, b{x) = A*(Aa; — y), then the solution of 
’’equation (13)” is the unique couple {x,l) of continuous maps such that l{0) = 0, 
t ^ l{t) has bounded variation on each compact interval and 

dl{t) 


dx{t) = —[dlt + A*{Ax{t) — y)dt] + dwt, 


dt 




(15) 


II Aaj—-yll 


2) By choosing (p(a;) = ||a;||i -|- 2 

’equation ([I^” is given by the couple {x{t),k{t)) such that 

dx{t) = —dk{t) + dw{t), k{t) G d(p{x{t)). 


b{x) = 0, then the solution of 


The uniqueness of the solution of ’’equation (13)” implies that dk{t) = dlt + 
A*{Ax{t) — y)dt. Now, we are going to show that I is absolutely continuous. For 
this aim we recall Skorokhod problem [7|. Let / be any continuous function from 
[0, T] —)■ and —)• M be any convex function. Then there exists a unique couple 

{x, k) of continuous maps such that fc(0) = 0, f —)■ k{t) has bounded variation on each 
compact interval, 

x{t) = f{t) - k{t), Vt>0, (16) 

and the measure (x(t) — a(t), dk{t) — (3{t)dt is nonnegative for all continuous trajectory 
t —)■ {a{t),(3{t)) such that (3{t) G d'ip{a{t)). Now we are ready to announce our result, 
prop: Suppose that 


m = sup{||n| 


V G 


U 


( 17 ) 


xeRp 


is hnite. Then the function I solution of Skorokhod problem ’’equation (16)” is abso¬ 
lutely continuous. 


proof: Let e G such that ||e|| = 1, 7 > 0 and v G dijj{'je) having the smallest 
Euclidean norm. As {x, k) is the solution of Skorokhod problem, then 

{x{t) — ye, dl{t)) > {x{t) — ye, vdt) 

> -m (||a:(t)|| -h y) dt. 
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For each 0 < s < t, we have 


{l{t) - l{s),e) = / {e,dl{u)) 


= 7 / {xiu),dl{u)) - 'y / {x{u) -'ye,dl{u)) 


ti+1 


<7 ^ / {x{u),dl{u)) + rri'y ^ I \\x{u)\\du + m{t — s). 
Js Jti 

From the latter inequality and 

||Z(t) — Z(s)|| = sup{(Z(t) — Z(s), e) : e G ||e|| = 1 }, 

and by tending 7 —)■ +C)0, we get 

Which achieves the proof. 


By choosing f{t) = Xq — A*(Ax(s) — y)ds] + Wt, we derive that {x,l) 


equation (15)” is the solution of Skorokhod problem. As the hypothesis ’’equation (17) 


is satished for i/j{x) = ||a;||i, with m = 1, then I is absolutely continuous. Finally the 
solution of ’’equation (|I^” satishes 

a;(t) = a;(0) — f [t;(s) + A*(Aa;(s) — 7/)](is + (18) 

Jo 

and v(t) G cl||a;(f)||i, ||r^(t)|| < 1, dt a.e. Moreover we can show that a.s. for z = 1,... ,p 


that Xi{t) 7 ^ 0 and Vi{t) = sgn{xi{t)), dt a.e. The ’’equation (18)” becomes 


where 


dx{t) = 2 ^^^^ {p{x{t)))dt + dw{t), 


p(x) := — exp( - 2 ||a:||i - HAa: - yf] 


(19) 


( 20 ) 


The equation ’’equation (19)” is known as distorted Brownian motion [18] with the 

generalized Schrodinger operator 

11 ^ 1 

H = --A - +Trace{A*A)^ + -\\sgn{x) + A*(Aa; - z/)f. 

i=l 

Here A is Laplacian operator and 6 denotes the Dirac measure at 0. 


3.3. Transition probabilities in the one dimensional case 

In the one dimensional case 

dx{t) = —Xsgn{x(t))dt + dw(t), a:(0) = xq, A > 0 

is known as bang-bang Brownian motion [25] , or the diffusion with V potential [2^ . In 
this case Schrodinger operator has the form 
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The transition probabilities p^{x, t \ xo, 0) of the bang-bang Brownian motion is known 
[3] . We can calcnlate it nsing Girsanov Formnla, and the trivariate density of Brownian 
motion, its local time and occnpation times (|I9]). We obtain 

p^{x, 1 1 Xo) = q^{x, 1 1 xo)A exp(—2A|x|) 


where 


t\xo) = exp (A(|xo| + |x|) 


tX 


Xt - (|x| -f |xo|) \ 


^ p,{x-x,) + F{^ ^ 


F{x) 


lt{u) 


exp{-\) 


du, 


exp{-^) 


\/2tTl 

Observe that p^{x,t \ xo,0) —>■ Aexp(—2A|x|) as f —)■ -|-cxd for all xq. Hence, the MSDE 
dx(t) = —Xsgn{x(t))dt -|- dw(t) 
is ergodic with the invariant density Aexp(—2A|x|). 


4. Sampling using multivalued SDE 


As we said before, the solution {x{t)) of ’’equation (13)” is ergodic. It follows that 
lim'T^+oo has the probability distribution p ’’equation (20)”. If we dispose of a 


trajectory t G [0,T] —)■ Xt for large T, then for any p-integrable function h, 


- / h{xt)dt^^h{x{T))] 


h{x)p{x)dx. 


'RP 


Hence for large T the expectation E[a;(T)] of the solution ’’equation (13)” is close to 
Bayes estimator ’’equation ([^”. We will approximate E[a;(T)] using numerical schemes 


of ’’equation (13)” and the timestep 


iXti = 2"'T, 


( 21 ) 


with the level I = 1^,1^ + .... In all the sequel the small level Ig '■= + 1. 

Having a numerical scheme {xl{sc, k) : k = 1,..., 2^) such that E[a3^.(sc, 2^)] —)■ 
E[a;(T)] as L —)■ -|-oo, we need to calculate K[xl{sc, 2^)] for large L. To achieve this goal 
we use Monte Carlo (MC) and multilevel Monte Carlo (MLMC) algorithms. We will 
discuss the efficiency of MC and MLMC estimates. We will mimic the results obtained 
in [27] for Coulomb collisions, and propose a method for calculating the cost. 


5. MC Efficiency and computational cost 

Given a sample {xf{sc, 2^) : k = 1,..., Ni) of xi{sc, 2^) having the size Ni, we define 

1 

xf'‘{sc^2^) = — (sc, 2^), I, andW are fixed. 


k=l 


( 22 ) 
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xi{sc,2’‘) ■=¥\xi{sc,2^)] = lim xfHsc,2^), 

x{T)-.='E[x{T)] = lim xi{sc,2’'). 

1^+00 


(23) 

(24) 


We recall that MC proposes to estimate Xi{sc,2'') ’’equation (23)” by ;rf^'(sc, 2*) 
’’equation 


If we estimate x{T) ’’equation (24)” by xf^‘{sc, 2*), then the error has two sources. 
The approximation of E[a;(T)] by E[a;i(sc, 2^)], and a hnite sampling error that depends 
on the number of samples Ni. 

An accurate estimate x^'^{sc, 2*) of x{T) is one for which the mean square error 

j|i(r)-*«K2')||=( 

= ||a;(T) - xi{sc, 2^)||^ + E ||a;i(sc, 2^) - xf\sc, 2')|p 
is small. We have 

Vari{sc) 


MSE = E 


E 


\xi{sc, 2^) — xf'‘{sc,2 


-)(M|2 


N, 


where 


Vari{sc) = y^Var{xi^i{sc, 2^)). 


Z=1 


Here xi^i{sc,2’‘) is the Ath component of xi{sc,2^) and Var{xi^i{sc,2^)) its variance. 

The quantity 

||;r(T) - xi{sc,2^)\\‘^ = e{sc,Ati) (25) 

is a function of the timestep At/. It is central in the computational cost and we suppose 
that is known. 

The estimate xf‘{sc,2^) is accurate to within ri‘^ of x{T) if 


MSE = E 

= e(sc, At/) + 


4(r)-*f''(s<;,2')||2 

Var A sc) 


= V 


N, 


(26) 


The computational cost K of obtaining {xf{sc, 2^) : /c = 1,..., A/) is the product of 
the number of timestep ^ = 2^ and the number of samples A/. Namely, 

A(A/,At/) = A/^ = A/2b 

At/ 

To make the scheme as efficient as possible, K must be minimal subject to the constraint 
’’equation ([2^”. Applying the method of Lagrange multipliers 

T 

k 

we get the optimal choice 


L{Ni,Ati,X) = + A (^e(sc. At/) + 


VarAsc) n 

- V 

A/ ' 


T Vari{sc)_ 

















( 27 ) 


e(sc,AU)^^^=,f. 

jV; 


It follows that 


d e(sc, Ati) rf — e(sc, At;) 


(28) 

(29) 


At, At, 

e(sc,At,) <7]^. 

We propose to solve the latter system numerically as follows. In all the sequel we 
estimate x(T) by xl{sc, 2^) with L = 16. Hence we obtain the following approximation: 

\\xl{sc, 2^) - xi{sc,2^)\\‘^ ^ e{sc,Ati). (30) 

Second 

d e(sc, At;) e(sc, At;) — e(sc, At;+i) 


At; 


r2-'-i 


The ’’equation (28)” becomes 

3e(sc, At;) — 2e(sc, At;+i) rf. 
Now we calculate for I > Ig the quantity 
3e(sc, At;) - 2e(sc, At;+i) 
until it becomes close to jf and 
e(sc, At;) < rf. 

Having /, we calculate Varfsc) by 


P N 


N 


2=1 k=l k=l 


(31) 

(32) 

(33) 


Having I and Varfsc) we calculate the optimal sample size A; using the ’’equation (27)’ 
and then we derive the optimal cost Ki. 


6. MLMC Efficiency and computational cost 

Multilevel Monte Carlo (MLMC) was initially developed for hnancial mathematics [16] , 
Cl and now used in a disparate areas. 

Multilevel Monte Carlo considers multilevels. In our study we consider the levels 
I = Igfs + 1,... ,lrn < L = 16. The smallest level Ig is choosen such that Af;^ = 
We generate a sample {x^^{sc,2’“‘) : k = 1,...,A;J of size A;^ of xifsc,2^^), and for 
each / = Zs + 1,..., we generate from the same underlying stochastic path and initial 
conditions the samples {xi{sc, 2^) : k = 1,..., A;) and {xi_-^^{sc, 2^“^) : k = 1,, A;) 
respectively of xi{sc,2^) and a;;_i(sc, 2^“^). Moreover, the samples {xffsc,2''‘) : k = 
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1,..., iViJ, (a;f (sc, 2'-),xf_^{sc, 2' : /c = 1,..., iV«) for / = /^ + 1, ..., Im have to be 

independent. Using the telescoping sum 

= a;z^(sc,2'*) + ^ (^^(sc, 2*) - &z_i(sc, 2'-^)), 

/=/s + l 

MLMC proposes the estimate 

of a;«„(sc,2'-) := E[a;z^(sc, 2'-)]. 

We introduce for each level I and sample size Ni the following notations: 
x^‘{sc, 2*) — :r^^(sc, 2^“^) := 6xf‘{sc, 2^). 

It follows that 

Im 

Im 

:= *z?'(sc,2'0 + (5i;f'(sc,20, (34) 

where 6x^’^{sc,2^) := xf‘{sc,2'') — x^_^-^{sc,2^~^). 

An accurate estimate a;^"*(sc, 2^"*) of x{T) is one for which the mean square error 


MSE := E 


J:r(T) -*2™ 2'™)II^J = ||*(T) - a;z„(sc, 2''")f + Uar(a;2™ (sc, 2'™)) 

is small. 

If we set Us = Var{xi^{sc, 2’"’)), and for / = /* + 1,..., Z^, 

Vi = Var(Sxi(sc,2‘y}, 


(35) 


then 


Uar(*2™ (sc, 2'-)) = 


l = ls 


N, 


and 


MSB = ||i(T) - x,j8c, 2'")||= + Z 


l=ls 


N, 


If 


\x{T) -xi^{sc,2 


\ 11 2 


:= else 


T r 

l = ls 


(36) 


then efficiency of MLMC is equivalent to minimize 

I'm Im 




l = ls 


l = ls 


T 

Ab’ 


( 37 ) 
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under the constraint ’’equation (36) 


We estimate for < Z < L = 16, \\x(T) — Xi{sc, by \\xl{sc, 2^) — Xi{sc, 2^)||^ 
and then we are interested in the set 1{t]) of levels I such that 

^2 


e(sc, Ati) = \\Xl{sc^ 2^) — Xi{sc, 2 




For each lopt G l{p), the ’’equation (36)” becomes 

lopt 

Vi 


=v‘^ -e{sc,Atiopt). 


1=1. 


(38) 


Having the optimal lopt, the minimization of K ’’equation (37)” under the constraint 


(38) is solved by Lagrange multiplier 


Hence 


ajv,(7s: +A(^ + -e{sc,Atiopt))) = 0, I = h,... Jopt. 

l=ls + l 




A^ 2 ) ^ Ig,... ,lopt, 


lopt 


^ ^ ^ V At/opt)* 


I — ^s + 1 


N, 


It follows for I = Ig,..., lopt, that Ni = y/XVi2~’-, and then 
Y = V - e(sc, Atiopt). 

l = ls 

Having lopt, we estimate and {Vi : I = Ig + I, ■ ■ ■, lopt) by 

p N 

h = EvEi< 


N 

i=l k=l 
p ^ N 


X 


i.k 12 


i=l k=l 


Hence for I = Ig,..., lopt 


772 _ g(- 5 c, Atiopt) 




/c- la 


Now, we are going to present our schemes. 


(39) 

(40) 

(41) 


7. Semi-implicit Euler schemes 

Numerical approximation has been tackled in 0 . 1211 . H. lai. see also 


Semi- 


implicit Euler scheme (SIES) of ’’equation (14)” is given by 

xi{k + 1) - xi{k) = -Vp{xi{k + l))Af; - b{xi{k))Ati + \^Atin{k + 1), 
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where (n(fc +1) : fc = 0,1,..., 2^ — 1) is a sequence of i.i.d. standard Gaussian vectors. 
Known Xi{k) and n{k + 1), we have 


xt{k + 1) ^ proXAtiv(^xi(k) - b(xi(k))Ati + \/^tn{k + 1)V 


(42) 


The weak and the strong convergence propertie of the scheme ’’equation (42)” to 


the solution ’’equation (14)” are defined respectively in terms of 
eU^ti) = \\E[x{T) - xi{2%\, 
es{Ati) = E\\\x{T)-Xi{2^)f 


(43) 

(44) 

(45) 


From ([5]) the strong error ’’equation (44)” is estimated by 

By setting (p{x) = ||a;||i, b{x) = A*(Aa; — y), the scheme ’’equation (42)” is known as 
STMALA algorithm ([I5]). 


8. Explicit Euler scheme 

8.1. Algorithm EESl 


By setting <.p{x) = ||a;||i + ^^d h = 0, EESl of (13) is given by 

Xi{k + 1) = proxMiip{xi{k)) + A/^n^+i. 

The proximal prox^tiipix^^'^) is not computable, but for large I, we have 
proxAtiifix) proxj^tiW-h (* + A*(l/ - Ax)Ati 
Finally we get 

xi{k + 1 ) = proxAtiW-w 

known as PULA algorithm 


+ A*{y - Axi{k))Ati'^ + (46) 


8.2. Algorithm EES2 

By setting (p{x) = ||a;||i, and b{x) = A*(Aa; — y), we obtain our new scheme 

xi{k + 1) = proxAti\\.\\i{xi{k)) + A*{y - Axi{k))Ati + (47) 


9. Numerical implementation 

As an illustration we consider the case p = 10, n = 7 and the entries of the matrix A 
are independent Bernoulli random variables with values ±;^, and w ~ 7V(0, |l„). 
We simulate the vector x{true) from the PDF exp(—2||a;||i). We get the data 
y := Ax{true) + w from a realization of A and w. The time horizon T = 10 the 
maximal level L = 16 and the smallest level Ig = 5. 










9.1. Graphics of Trajectories of each scheme 

For each scheme sc and for each level I = ls,ls + 1,/^ + 2, we plot the trajectories 
k G [0, 2^] —)■ Xi{sc, k). For the largest level L = 16 we plot only the hrst component. 
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Figure 3. The chains of SIES, EESl and EES2 for I = Is + 2. 



X 10 



X 10 



X 10 


Figure 4. The first component of the chains SIES, EESl and EES2 for I = 16. 


9.2. The Cost of each sheme using MC 


We approximate for each scheme x{T) by xl{sc, 2^) with L = 16, and we look for the 
optimal level lopt and the optimal sample size Nopt such that 

Nopt 


MSE := E 


\\E[xl{sc,2^)' - 


Nopt 




k=l 


= 9 
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We need for / = , L — 2 to calculate e(sc, At;) := ||E[a;i(sc, 2^)] — E[a;;(sc, 2^)]p. 

Using Monte-Carlo with N = 1000, we obtain by 

N N 

e{sc,Ati) ^ II —^a;^(sc,2^) - — ^ a;f (sc, 2')||l 

k=l k=l 

Table 1 shows the numerical values of e(sc. At;) for each scheme and for / = 5,..., 13. 


l 

5 

6 

7 

8 

9 

10 

11 

12 

13 

e(SIES,AtP 

0.0050 

0.0080 

0.0071 

0.0022 

0.0054 

0.0066 

0.0056 

0.0043 

0.0022 

e{EESl,AtP 

0.0380 

0.0025 

0.0069 

0.0043 

0.0016 

0.0039 

0.0027 

0.0032 

0.0022 

e(EES2,AtP 

0.0107 

0.0041 

0.0044 

0.0042 

0.0054 

0.0111 

0.0041 

0.0048 

0.0065 


Table 1. Numerical values of e(sc, At;) for each scheme and for t = 5,..., 13. 


By fixing rj'^ > max(e(sc. At;), sc = SIES, EESl, EES2,l = 5,..., 13), the 
constraint e(sc. At;) < holds for each level I = 5,..., 13. The optimal level lopt 
is such that 3e(sc, At;) — 2e(sc, At;+i) p‘^. Having lopt we calculate 

p 

Variopt(sc) := y^Var{xiopt^j (sc, 2^°^*)), 

i=l 

f 1 ^ I 1 " 2 

“ E]vZ|A«K2'°'’')-^EA«K2'“'’') . 

i=\ k=l k=l 

and we derive the optimal Nopt{sc) = • 

The Figure 5 shows how to hnd graphically the optimal level lopt. 

We summarize for the three schemes in the Table 2 the values of lopt, Nopt and 
their cost. The scheme SIES has the lowest cost. 



lopt 

Nopt 

Cost 

SIES 

7 

70 

8938 

EESl 

7 

81 

10427 

EES2 

10 

83 

85035 


Table 2. Optima level and cost of MC for each scheme. 


9.3. Computational cost of MLMC 

In the Figure (6) for each scheme we plot I —e(sc. At;) ’’equation (25)”. We derive 
graphically the optimal level lopt- 
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Figure 5. Graphical determination of lopt for the schemes SIES, EESl and EES2. 



Figure 6. Graphical identification of lopt for each sheme. 


We summarize for the three schemes in the Table 2 the values of lopt, 
NiXopt), ■ ■ ■, Niopt{opt) and their cost. Like MC method the scheme SIES has the lowest 
cost. 

N.B. For each lopt, the optimal sample sizes are N^-iopt ■= N^{opt), ..., Niopt{opt), e.g. 
for the scheme SIES lopt = 6 and W-e = 74, 20. 
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lopt 

Ns-ioptiopt) 

Cost 

SIES 

6 

74 20 

3639.18 

EESl 

7 

132 40 16 

8962.85 

EES2 

10 

167 59 23 9 4 2 

18029.47 


Table 3. Optimal level and cost of MLMC for each scheme. 


10. Markov chain Monte Carlo method MCMC 


Using the ergodicity we suppose that the PDF of x{T) is approximated by p{x) = 
Z~^ exp ( — 2||a;||i — || Aa; — For the error 7f hxed the cost of MCMC is the sample 
size N such that 


E 


N 




k=l 


V 


Here MCMC is a trajectory of the Markov Chain Monte Carlo having the target p. 

We recall how MCMC works. Let k —?• MC{k) be a Markov chain having the 
transition probability density 7r(a;2 I > 0 for all Xi,X 2 G We construct from MC 
a new Markov chain k —)• MCMC{k) having the transition probability 


aTT{x2 I xi)dx2 + (1 - a)6x^{x2) 


where 


a = min 


/ p(a;2)7r(a;i |a;2) \ 

V ’ p(a;i)7r(a;2 \xi)J' 


The new Markov chain MCMC is ergodic and has p{x) as its invariant probability 
density function. We propose the Markov chains MC{k) := xiopt{sc,k) for 
sc = EES1,EES2 and MC{k) = RW{k,a^). Here RW{k,a^) denotes 
the Gaussian random walk, each step has the variance cr^. We obtain three 
MCMC chains: MCMCproAEESl), MCMCprox{,EES2), MCMCrw Observe that 
MCMCprox{EESl) is known as PMALA [22]. Table 4 shows the cost of each method. 


10.1. Computational cost of MCMC 


In the table 4, we indicate the different costs of MC, MCMCprox and MCMCrw 
We create for each A, M MCMC chains {MCMC^{k) : k = 1,..., A, z = 1,..., M). 
We approximate E ||E[!i:(r)] - i MCMC(*:)P by 5 ||E[!i:i(sc, 2'-)] - 


Table 4 shows that the MCMCrw corresponding to the proposal distribution 
A/'(0, 0.3) is the winer. But it loses against MLMC with the scheme SIES (see Table 2). 

Concluding remark. In this work we studied the approximation of Bayesian 
Lasso using MC, MLMC and MCMC methods and three schemes Semi-implicit Euler 
scheme (SIES), and two Explicit Euler schemes EESl and EES2. Furthermore, we 
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Cost (MC) 

Cost {MCMCprox) 

Cost (MCMCrw) 

Cost (MCMCrw) 

EESl 

10427 

5340 

3990 ( 0-2 = 0.3) 

17230 ( 0-2 = 0.8) 

EES2 

85035 

6200 

3890 ( 0-2 = 0.3) 

16230 ( 0-2 = 0.8) 


Table 4. Cost of MC, MCMCprox and MCMCnw for EESl and EES2 schemes. 


proposed a method for calculating the cost of each method and each scheme. We showed 
that the winner is MLMC with the scheme (SIES). 
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